Primary Output

all_stocks_fundamental_analysis.json.gz

The primary deliverable: a comprehensive analysis of all 2,775 stocks, with 86 fields per stock.
File Details
  • Location: Root directory of pipeline
  • Retention: Permanent (kept after cleanup)
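
For readers who want to inspect the primary output, a minimal sketch of loading a gzip-compressed JSON file with the Python standard library is given here. The helper name `load_gzip_json` is hypothetical, and the top-level shape of the file (list or dict) is not stated in this page, so the sketch makes no assumption about it.

```python
import gzip
import json
from pathlib import Path


def load_gzip_json(path):
    """Read a gzip-compressed JSON file and return the decoded Python object."""
    # "rt" opens the gzip stream in text mode so json.load can consume it directly.
    with gzip.open(Path(path), mode="rt", encoding="utf-8") as handle:
        return json.load(handle)
```

A call such as `load_gzip_json("all_stocks_fundamental_analysis.json.gz")` would return whatever structure the pipeline wrote; checking `type(...)` on the result first is a reasonable way to find out whether it is a list of records or a mapping.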

Secondary Outputs

sector_analytics.json.gz

Sector-level aggregations and analytics.
  • Format: JSON (gzip compressed)
  • Generated by: process_market_breadth.py
  • Retention: Permanent

market_breadth.json.gz

Market breadth indicators and relative strength ratings.
  • Format: JSON (gzip compressed)
  • Generated by: process_market_breadth.py, process_historical_market_breadth.py
  • Retention: Permanent

ohlcv_data/

Historical OHLCV (Open, High, Low, Close, Volume) data for all stocks.
Directory Details
CSV Format:
Date,Open,High,Low,Close,Volume
2024-01-02,1250.50,1275.80,1245.20,1270.00,1234567
2024-01-03,1272.00,1285.30,1268.00,1280.50,987654
  • Retention: Permanent (kept after cleanup)
  • Purpose: Required for ADR, RVOL, ATH calculations and earnings performance tracking.
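
To illustrate how the CSV layout above could feed a calculation such as ADR, here is a rough sketch. It assumes ADR means average daily range, expressed as the mean of (High / Low - 1) × 100 over a trailing window; the pipeline's actual formula and window are not documented here and may differ. The function names and the 20-row default are hypothetical.

```python
import csv
import io


def read_ohlcv(text):
    """Parse OHLCV CSV text into a list of dicts with numeric fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "Date": record["Date"],
            "Open": float(record["Open"]),
            "High": float(record["High"]),
            "Low": float(record["Low"]),
            "Close": float(record["Close"]),
            "Volume": int(record["Volume"]),
        })
    return rows


def average_daily_range_pct(rows, window=20):
    """Mean of (High / Low - 1) * 100 over the last `window` rows.

    Assumed definition; the pipeline may compute ADR differently.
    """
    recent = rows[-window:]
    if not recent:
        raise ValueError("no rows to average")
    return sum((r["High"] / r["Low"] - 1.0) * 100.0 for r in recent) / len(recent)


# The two sample rows shown in the CSV format above.
SAMPLE = """Date,Open,High,Low,Close,Volume
2024-01-02,1250.50,1275.80,1245.20,1270.00,1234567
2024-01-03,1272.00,1285.30,1268.00,1280.50,987654
"""
```

Calling `average_daily_range_pct(read_ohlcv(SAMPLE))` averages the two sample days. For a file on disk, the text can be read with `Path(...).read_text()` and passed to `read_ohlcv` unchanged.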

Intermediate Files (Auto-Cleaned)

When CLEANUP_INTERMEDIATE = True (default), the following files are automatically deleted after pipeline completion:

Core Data Files

Maps stock symbols to ISIN codes and security IDs.
  • Size: ~500 KB
  • Used by: All Phase 2 scripts
  • Generated by: fetch_dhan_data.py
Full market snapshot with technical indicators for all stocks.
  • Size: ~15 MB
  • Records: 2,775 stocks
  • Generated by: fetch_dhan_data.py
Quarterly results and financial ratios for all stocks.
  • Size: ~35 MB
  • Records: 2,775 stocks × quarterly data
  • Generated by: fetch_fundamental_data.py
Pivot points, EMA/SMA signals, and technical sentiment.
  • Size: ~8.3 MB
  • Generated by: fetch_advanced_indicators.py
Live corporate announcements and regulatory filings.
  • Size: ~5-10 MB
  • Generated by: fetch_new_announcements.py

Corporate Actions Files

Upcoming dividends, bonus issues, splits, and results dates.
  • Time Range: Next 2 months
  • Generated by: fetch_corporate_actions.py
Historical corporate actions.
  • Time Range: Last 2 years
  • Generated by: fetch_corporate_actions.py

Market Intelligence Files

Bulk and block deals from the last 30 days.
  • Generated by: fetch_bulk_block_deals.py
Stocks hitting circuit limits.
  • Generated by: fetch_circuit_stocks.py
ASM (Additional Surveillance Measure) and GSM (Graded Surveillance Measure) lists.
  • Generated by: fetch_surveillance_lists.py
Daily price band revisions.
  • Generated by: fetch_incremental_price_bands.py
Complete list of all securities with their current price bands.
  • Generated by: fetch_complete_price_bands.py
NSE listing dates for all equity securities.
  • Source: NSE Archives
  • Downloaded via: cURL in run_full_pipeline.py

Directories

Individual filing JSONs for each stock.
  • Files: {SYMBOL}_filings.json (2,775 files)
  • Total Size: ~100-200 MB
  • Content: Top 100 regulatory filings per stock (hybrid from LODR + Legacy endpoints)
  • Generated by: fetch_company_filings.py
Sentiment-analyzed news for each stock.
  • Files: {SYMBOL}_news.json (2,775 files)
  • Total Size: ~50-100 MB
  • Content: Top 50 news items per stock with AI sentiment (positive/negative/neutral)
  • Generated by: fetch_market_news.py
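
Because both directories follow a `{SYMBOL}_<kind>.json` naming pattern, the set of symbols present can be recovered from file names alone. The sketch below shows one way to do that; `symbols_in_directory` is a hypothetical helper, not part of the pipeline, and it makes no assumption about the JSON content inside each file.

```python
from pathlib import Path


def symbols_in_directory(directory, suffix):
    """Return sorted symbols for files named {SYMBOL}{suffix} in `directory`."""
    symbols = []
    for path in Path(directory).glob(f"*{suffix}"):
        # Strip the fixed suffix to leave only the symbol part of the file name.
        symbols.append(path.name[: -len(suffix)])
    return sorted(symbols)
```

For example, `symbols_in_directory("company_filings", "_filings.json")` and `symbols_in_directory("market_news", "_news.json")` could be compared to spot symbols that have filings but no news file, provided the pipeline is run with cleanup disabled so the directories still exist.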

Base JSON (Replaced by .gz)

Uncompressed version of the master output.
  • Size: ~30-40 MB
  • Deleted after: Compression to .json.gz in Phase 5
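
The compress-then-delete step described above can be sketched as follows. This is an illustration of the general technique using the standard library, not the pipeline's own Phase 5 code; the function name `compress_and_remove` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def compress_and_remove(json_path):
    """Write <json_path>.gz next to the source, then delete the source file."""
    source = Path(json_path)
    target = source.with_name(source.name + ".gz")
    # Stream the bytes so a 30-40 MB file is never held fully in memory.
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Remove the uncompressed file only after the compressed copy is closed.
    source.unlink()
    return target
```

Deleting the source only after the `with` block exits means an interrupted compression leaves the original JSON in place.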

Cleanup Behavior

Configuration Flag

# In run_full_pipeline.py
CLEANUP_INTERMEDIATE = True  # Default: True

What Gets Kept

Compressed Outputs

  • all_stocks_fundamental_analysis.json.gz
  • sector_analytics.json.gz
  • market_breadth.json.gz

OHLCV Data

  • ohlcv_data/ directory (all CSV files)

What Gets Deleted

  • All intermediate JSON files (13 files)
  • company_filings/ directory
  • market_news/ directory
  • nse_equity_list.csv
  • Uncompressed .json versions

Cleanup Report

The pipeline prints a summary after cleanup:
🗑️  Cleaned: 13 files + 2 dirs (245.3 MB freed)
🧹 Only .json.gz + ohlcv_data/ remain. All intermediate data purged.
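
As a rough illustration of the cleanup behavior described in this section, the sketch below deletes a given set of files and directories and produces a summary in the same spirit as the report above. It is not the code in run_full_pipeline.py: the function names, the `enabled` parameter (standing in for CLEANUP_INTERMEDIATE), and the choice of 1 MB = 1024 × 1024 bytes are all assumptions.

```python
import shutil
from pathlib import Path


def path_size(path):
    """Size in bytes of a file, or of all files under a directory."""
    if path.is_dir():
        return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
    return path.stat().st_size


def cleanup(root, files, dirs, enabled=True):
    """Delete the given files and directories under `root`.

    Returns (files_removed, dirs_removed, bytes_freed). Missing paths are skipped.
    """
    if not enabled:
        return 0, 0, 0
    root = Path(root)
    files_removed = dirs_removed = freed = 0
    for name in files:
        target = root / name
        if target.is_file():
            freed += path_size(target)
            target.unlink()
            files_removed += 1
    for name in dirs:
        target = root / name
        if target.is_dir():
            # Measure before removal; the size is unavailable afterwards.
            freed += path_size(target)
            shutil.rmtree(target)
            dirs_removed += 1
    return files_removed, dirs_removed, freed


def format_report(files_removed, dirs_removed, freed_bytes):
    """Format a summary line similar to the one the pipeline prints."""
    megabytes = freed_bytes / (1024 * 1024)
    return f"Cleaned: {files_removed} files + {dirs_removed} dirs ({megabytes:.1f} MB freed)"
```

Passing an explicit list of names, rather than globbing for `*.json`, keeps the permanent outputs and `ohlcv_data/` out of reach of the delete step by construction.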

File Size Summary

| Category | Size (Before Cleanup) | Size (After Cleanup) |
|---|---|---|
| Compressed Outputs | ~8-10 MB | ~8-10 MB |
| OHLCV Data | ~500 MB - 2 GB | ~500 MB - 2 GB |
| Intermediate Files | ~200-400 MB | 0 MB |
| Total | ~700 MB - 2.4 GB | ~500 MB - 2 GB |
First run: OHLCV download takes ~30 minutes and generates ~2 GB of historical data.
Subsequent runs: OHLCV updates take ~2-5 minutes (incremental sync only).
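
One plausible shape for an incremental sync is sketched below: find the last date already stored in a symbol's CSV, then append only rows dated after it. How the pipeline actually implements its sync is not described here, so the function names, the in-memory `fetched_rows` input, and the assumption that an existing file ends with a newline are all hypothetical.

```python
import csv
from pathlib import Path

FIELDS = ["Date", "Open", "High", "Low", "Close", "Volume"]


def last_stored_date(csv_path):
    """Return the Date of the final data row, or None if there are no rows."""
    path = Path(csv_path)
    if not path.exists():
        return None
    last = None
    with path.open(newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):
            last = record["Date"]
    return last


def append_new_rows(csv_path, fetched_rows):
    """Append only rows dated after the last stored date; return how many were added.

    `fetched_rows` is a list of dicts keyed by the CSV header names.
    ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    Assumes an existing file ends with a newline.
    """
    path = Path(csv_path)
    last = last_stored_date(path)
    new_rows = [r for r in fetched_rows if last is None or r["Date"] > last]
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, lineterminator="\n")
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Under this scheme a second run on the same day appends nothing, which is consistent with later runs being much faster than the first full download.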

Standalone Outputs (Optional)

The main pipeline generates these files only when FETCH_OPTIONAL = True:

FNO Data

  • fno_stocks_response.json - 207 F&O stocks
  • fno_lot_sizes_cleaned.json - Lot sizes for F&O contracts
  • fno_expiry_calendar.json - Expiry dates for futures and options

Indices & ETFs

  • all_indices_list.json - 194 market indices
  • etf_data_response.json - 361 ETFs
Generated by: fetch_all_indices.py, fetch_etf_data.py
These files are not merged into all_stocks_fundamental_analysis.json and must be processed separately if needed.